Contributor

@friendlymatthew friendlymatthew commented Oct 21, 2025

Which issue does this PR close?

Rationale for this change

This PR preserves the typed_value's Field metadata. This way, we can check for extension types.

github-actions bot added the parquet-variant label (parquet-variant* crates) on Oct 21, 2025
Comment on lines +940 to +952
pub fn with_column(mut self, field_name: &str, array: ArrayRef, nullable: bool) -> Self {
    let field = Field::new(field_name, array.data_type().clone(), nullable);
    self.fields.push(Arc::new(field));
    self.arrays.push(array);
    self
}

pub fn with_field(mut self, field: FieldRef, array: ArrayRef) -> Self {
    self.fields.push(field);
    self.arrays.push(array);
    self
}

Contributor Author

I don't love the naming here

Contributor

This is a convenience method, right?

We already have the data type (from the array), so maybe we just need to pass optional metadata, making this similar to the Field constructor? Or, if that's too disruptive for existing callers, with_column_and_metadata?

Member

How about with_column_name and with_field_ref?
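
To make the naming options in this thread concrete, here is a minimal sketch. `DataType`, `Field`, `Array`, and `StructBuilder` below are simplified stand-ins invented for illustration, not the arrow-rs types, and `with_column_name` / `with_field_ref` are simply the names suggested above. The name-based method delegates to the field-based one, so there is a single place where fields and arrays are pushed.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Simplified stand-ins for illustration only; these are NOT the arrow-rs types.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Utf8,
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
    metadata: HashMap<String, String>,
}

type FieldRef = Arc<Field>;

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct Array {
    data_type: DataType,
    len: usize,
}

type ArrayRef = Arc<Array>;

#[derive(Default)]
struct StructBuilder {
    fields: Vec<FieldRef>,
    arrays: Vec<ArrayRef>,
}

#[allow(dead_code)]
impl StructBuilder {
    /// Convenience: derive the field from the array, with empty metadata.
    fn with_column_name(self, name: &str, array: ArrayRef, nullable: bool) -> Self {
        let field = Field {
            name: name.to_string(),
            data_type: array.data_type.clone(),
            nullable,
            metadata: HashMap::new(),
        };
        self.with_field_ref(Arc::new(field), array)
    }

    /// Full control: the caller supplies the field, including any metadata.
    fn with_field_ref(mut self, field: FieldRef, array: ArrayRef) -> Self {
        self.fields.push(field);
        self.arrays.push(array);
        self
    }
}
```

With this shape, existing callers that only have a name and an array keep a one-line call, while callers that need to preserve metadata pass a complete field.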

index: usize,
) -> Variant<'a, 'a> {
let data_type = typed_value.data_type();
let (_typed_value_field, typed_value_column) = typed_value;
Contributor Author

Now that we have the field information, we can check for extension types like: e21dc1b

Contributor Author

friendlymatthew commented Oct 21, 2025

cc @scovich @alamb @klion26

Contributor

@scovich scovich left a comment

Thanks for taking a stab at this. I'm not immediately sure how to react, so I just left a couple high level comments while I stew on it more.

Comment on lines 401 to +403
.typed_value_field()
.unwrap()
.1
Contributor

Just glancing at this code in isolation, I would guess that .0 is the field and .1 is the array?
But that requires knowing the context (i.e., this PR).

If this usage will show up often, is it worth returning a newtype instead of a tuple?

Contributor

Yes, please, let's return a type with real field names and documentation.

Comment on lines +297 to +308
let typed_value = if let Some(typed_value_array) = typed_value.clone() {
    let field_ref = Arc::new(Field::new(
        "typed_value",
        typed_value_array.data_type().clone(),
        true,
    ));
    builder = builder.with_field(field_ref.clone(), typed_value_array.clone());

    Some((field_ref, typed_value_array))
} else {
    None
};
Contributor

nit: this is just Option::map (no ? to complicate things)
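
For readers unfamiliar with the nit: an `if let Some(x) = opt { Some(f(x)) } else { None }` expression is equivalent to `opt.map(f)`. A minimal sketch of the pairing step follows, using simplified stand-in `Field` and `Array` types (not the arrow-rs ones) and a hypothetical helper name, `typed_value_pair`.

```rust
use std::sync::Arc;

// Simplified stand-ins for illustration only; these are NOT the arrow-rs types.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: String,
    nullable: bool,
}

type FieldRef = Arc<Field>;

#[derive(Debug, Clone, PartialEq)]
struct Array {
    data_type: String,
}

type ArrayRef = Arc<Array>;

/// Hypothetical helper: pair an optional array with a nullable
/// "typed_value" field derived from the array's data type.
fn typed_value_pair(typed_value: Option<ArrayRef>) -> Option<(FieldRef, ArrayRef)> {
    typed_value.map(|array| {
        let field = Arc::new(Field {
            name: "typed_value".to_string(),
            data_type: array.data_type.clone(),
            nullable: true,
        });
        (field, array)
    })
}
```

One caveat for the snippet under review: `with_field` takes `self` by value, so calling `builder = builder.with_field(...)` inside the closure would move the builder into it. Keeping the builder update in a separate `if let Some((field, array)) = &typed_value` after the `map` is one way around that.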

Comment on lines +660 to +668
let typed_value = if let Some(typed_value_array) = typed_value.clone() {
    let field_ref = Arc::new(Field::new(
        "typed_value",
        typed_value_array.data_type().clone(),
        true,
    ));
    builder = builder.with_field(field_ref.clone(), typed_value_array.clone());

    Some((field_ref, typed_value_array))
Contributor

Because of the dual typing (one in the field and one in the array), this code is technically no longer infallible: the data types could disagree. I'm not sure of the best way to address that as long as we're passing a full-blown field.

Contributor

Also, this logic is duplicated with the variant array constructor above.
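
One possible way to handle both points, sketched under assumptions: a single shared helper that checks the field's data type against the array's and returns an error on mismatch. The types below are simplified stand-ins rather than the arrow-rs types, and `check_field_matches_array` is a hypothetical name, not an existing API.

```rust
use std::sync::Arc;

// Simplified stand-ins for illustration only; these are NOT the arrow-rs types.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Utf8,
}

struct Field {
    name: String,
    data_type: DataType,
}

type FieldRef = Arc<Field>;

struct Array {
    data_type: DataType,
}

type ArrayRef = Arc<Array>;

/// Hypothetical helper: reject a field whose data type disagrees with the array's.
fn check_field_matches_array(field: &FieldRef, array: &ArrayRef) -> Result<(), String> {
    if field.data_type == array.data_type {
        Ok(())
    } else {
        Err(format!(
            "field '{}' has type {:?} but the array has type {:?}",
            field.name, field.data_type, array.data_type
        ))
    }
}
```

Calling one helper like this from both constructors would remove the duplication and make the disagreement case explicit, at the cost of turning the constructors into fallible ones.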

@friendlymatthew
Contributor Author

> Thanks for taking a stab at this. I'm not immediately sure how to react, so I just left a couple high level comments while I stew on it more.

One thing I'm debating is the coupling between the typed_value array and its field. If the field is optional metadata, then maybe we can split the array and the field into separate optional fields.

}
}

pub type TypedValue = (FieldRef, ArrayRef);
Contributor

This combination of an Array (which only has a DataType) and a Field (which has the DataType and metadata information) has come up recently upstream as @paleolimbot is working extension types through DataFusion as well in

We are still looking for a good pattern to use.

I wonder if something like this is too crazy

/// A value that has associated Arrow field information
struct TypedValue<T> {
  value: T, // could be ArrayRef, could be ScalarValue
  field: FieldRef,
}

🤔

Member

Probably struct TypedArrayRef { value: ArrayRef, field: FieldRef } is best for now (keeping the FieldRef-is-a-type bit opaque). Using FieldRefs as types in DataFusion has mostly led to a lot of cloned metadata and heterogeneity with respect to how the name/metadata/nullability should propagate (or not) but it does seem to be the path of least resistance.
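
A minimal sketch of what such a named type could look like, replacing the `.0` / `.1` tuple access discussed earlier. `Field` and `Array` are simplified stand-ins (not the arrow-rs types), and the accessor names and the `extension_name` helper are assumptions made for illustration. The metadata key `ARROW:extension:name` is the key the Arrow format uses for extension type names.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Simplified stand-ins for illustration only; these are NOT the arrow-rs types.
#[derive(Debug)]
struct Field {
    name: String,
    metadata: HashMap<String, String>,
}

type FieldRef = Arc<Field>;

#[derive(Debug)]
struct Array {
    len: usize,
}

type ArrayRef = Arc<Array>;

/// An array together with the field that describes it.
#[derive(Debug, Clone)]
struct TypedArrayRef {
    value: ArrayRef,
    field: FieldRef,
}

#[allow(dead_code)]
impl TypedArrayRef {
    fn new(field: FieldRef, value: ArrayRef) -> Self {
        Self { value, field }
    }

    /// The array holding the data.
    fn value(&self) -> &ArrayRef {
        &self.value
    }

    /// The field carrying name, nullability, and metadata.
    fn field(&self) -> &FieldRef {
        &self.field
    }

    /// The extension type name stored in the field metadata, if any.
    fn extension_name(&self) -> Option<&str> {
        self.field
            .metadata
            .get("ARROW:extension:name")
            .map(String::as_str)
    }
}
```

Call sites then read `typed.value()` and `typed.field()` instead of positional tuple fields, and the extension-type check lives in one documented place.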
